In this study we investigated virus-virus and vector-virus interactions by combining two approaches: (1) meta-transciptomic analysis and (2) gene-silencing experiments.
In the meta-transcriptomic analysis we looked at the overall viral landscape in the different libraries, detected vector modules (co-expressing genes using gene-network analysis), and correlated the viruses’ abundance to the vector modules. Last, we tested if the virus abundance matrix can predict the virus-vector interaction.
In this R markdown document we present the meta-transcriptomic analysis step by step in a way that can be reproduced using the input files available on the GitHub web-page. The following chunks are ordered as the sections in the Materials and Methods of the manuscript “Vector-virus interactions in multi-viral infection”, Eliash et al. (___). All of the reffered tables and figures can be found in the Manuscript.
We searched for the term “varroa” in the SRA databases (NCBI, January 2020) with the following filters:
The reads were mapped to both available varroa genome (Vdes_3.0, accession number: GCF_002443255, (Tehcer et al. 2019)), and to the genomes of 23 selected viruses (table 1). The alignment and estimation of transcript and virus abundances were done using Kallisto (Bray et al. 2016) (version 0.46.1 with default options). The abundances were outputted in transcripts per million (TPM) units.
Load libraries
library("dplyr")
library("tidyverse")
library("vegan")
library("DESeq2")
library("ggfortify")
library("WGCNA")
options(stringsAsFactors = FALSE) # Allow multi-threading within WGCNA. This helps speed up certain calculations.
library("rmarkdown")
library("knitr") # for the markdown
library("kableExtra") # for creating a scrolling table
library("ggplot2") # for plotting
library("ape") # for mantel.test
library("Biostrings")
library("ggrepel") # for spreading text labels on the plot
library("scales") # for axis labels notation
library("GO.db") # for GO term annotation
library("reshape2")
library("RSQLite")
library("AnnotationDbi") # for GO term annotation
library("GSEABase")
library("GOstats")
library("maps") # for the map background
library("htmltools")
library("rgdal")
library("grid")
library("gridExtra")
library("GeneOverlap") # for making the overlapping genes
library("cluster")
library("rmdformats")
library("corrplot") # for virus-virus correlation
library("viridis")
library("hrbrthemes")
library("ggthemes")
library("RColorBrewer")
library("naniar")
knitr::opts_chunk$set(echo = TRUE)
setwd("/Users/nuriteliash/Documents/GitHub/varroa-virus-networks") #set the directory as the local GitHub local repository
The mapped libraries were used to create the initial data frame, in which each column is a SRR library, and each row is an isoform. The cells contain the isoform TPM.
The output data frame contains 35,669 isoforms of the selected 71 varroa libraries. The first 35,641 rows are varroa genes, and the last 28 rows are reads that matched to the selected viruses (for viruses’ details, see Table 1).
# First make a function for making the mapped reads (output of kallisto) into a data frame
read_kallisto <- function(filename) {
sampleName <- sub("data/kallisto/(.*).tsv.gz","\\1", filename)
return(read_tsv(filename) %>%
dplyr::select(!!sampleName := tpm))
}
# now make the data frame, which contain all 71 libraries:
df_71 <- list.files(path = "data/kallisto", full.names = TRUE) %>%
lapply(read_kallisto) %>%
bind_cols()
# add a column "target_id" with the isoform/virus ID
df_71$target_id <- list.files(path = "data/kallisto", full.names = TRUE)[1] %>%
read_tsv() %>%
dplyr::select(target_id) %>%
pull()
#save(df_71, file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/kallisto_71.rds")
# Next, we join isoforms that belong to the same gene.
# Load the data frame created in the former chunk
load(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/kallisto_71.rds")
# Import the varroa transcripts ("target_id") and their corresponding gene ("gene_id").
varroa_isoforms <- read_tsv("/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/gene2isoform.txt.gz", col_names = c("gene_id", "target_id"))
# Join the varroa transcripts (varroa_isoforms), and the library tpm (df), by the isoform ID ("target_id"). As we do "left_join", the final data frame of "gene_tpm" contain ONLY varroa genes, excluding viruses' tpm.
gene_tpm_71 <- left_join(varroa_isoforms, df_71, by = "target_id")
# Collapse isoforms ("target_id") to a single row of a gene ("gene_id"), and sum the tpm(s) per gene per library
gene_tpm_collps_71 <- gene_tpm_71 %>%
gather("library","tpm", -target_id, -gene_id) %>%
group_by(gene_id, library) %>%
summarise(gene_tpm_71 = sum(tpm))
# spread the table again, by library
final_gene_tpm_71 <- spread(gene_tpm_collps_71, key = "library", value = "gene_tpm_71") %>% column_to_rownames('gene_id')
# this table will be used for all further analyses
# save(final_gene_tpm_71, file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/final_gene_tpm_71.rds")
Next, we perform Principle Component Analysis (PCA) on varroa genes, to detect outlierd libraries
# load the table
load(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/final_gene_tpm_71.rds")
# Before PCA, we transpose the table "final_gene_tpm_71", and transform (log10+0.000001)
final_gene_tpm_71_T<- transposeBigData(log10(final_gene_tpm_71 + 0.000001))
PCA_71 <- prcomp(final_gene_tpm_71_T)
p71 <- autoplot(PCA_71, label = TRUE, x = 1, y = 2)+
ggtitle("a. PCA of 71 libraries based on gene expression")
# Five libraries are obvious outliers: "SRR5109825", "SRR5109827", "SRR533974" , "SRR3927496", "SRR8867385".
final_gene_tpm_66_T <- final_gene_tpm_71_T %>%
rownames_to_column("library") %>%
dplyr::filter(!(library %in% c("SRR5109825", "SRR5109827", "SRR533974" , "SRR3927496", "SRR8867385"))) %>% column_to_rownames("library")
PCA_66 <- prcomp(final_gene_tpm_66_T)
p66 <- autoplot(PCA_66, label = TRUE, x = 1, y = 2)+
ggtitle("b. PCA of 66 libraries based on gene expression")
# plot the two PCA plots side by side:
par(mar = c(4, 4, .1, .1))
p71
p66
# when we remove them and repeat the PCA (Fig S2b), the library scatter more homogeneously.
Figure S1. PCA of varroa SRA libraires based on their genes TPM. a. All 71 libraries. The outlier libraries can be seen on the left side of the plot. These 5 libraries were excluded from further analysis. b. the final 66 libraries used for the analysis.
After filtering outlier libraries (based on PCA analysis), we used the left 66 SRAs belonging to 9 studies of varroa RNAseq for further analyses (details of the final libraries available in Table S5).
# We remove the five outlier libraries from the data frame, and save for subsequent analyses
load(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/kallisto_71.rds")
df <- df_71 %>% dplyr::select(-c("SRR5109825", "SRR5109827", "SRR533974" , "SRR3927496", "SRR8867385"))
# In addition, we save the final varroa genes TPM values per library, of the filtered 66 libraries, for the Gene Network analysis.
for_modules <- final_gene_tpm_71 %>%
dplyr::select(-c("SRR5109825", "SRR5109827", "SRR533974" , "SRR3927496", "SRR8867385")) %>%
transposeBigData()
#save(for_modules, file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/for_modules.rds")
We used a network analysis approach to identify groups of genes that share a similar expression pattern across a large set of available varroa transcriptomic data (RNAseq). To construct the gene-network, a weighted gene co-expression analysis was carried out using the WGCNA package in R, following the authors’ tutorial (Langfelder and Horvath 2016; Langfelder and Horvath 2008) . The gene network analysis included 4 main steps: (1) Network construction and module detection; (2) Correlating modules to external information, the varroa viral load; (3) Identifying important genes, ‘hub-genes’; and (4) GO-term enrichment analysis for varroa modules which interact with the viral load.
A gene co-expression network identify genes that interact with one another based on common expression profiles (Barabási and Oltvai 2004). Groups of co-expressed genes that have similar expression patterns across samples are identified using hierarchical clustering and are placed in gene ‘modules’ (Miller, Horvath, and Geschwind 2010). Weighted gene co-expression analysis was conducted using the WGCNA package in R (Langfelder and Horvath 2008).
Following the steps in WGCNA tutorial: “Automatic, one-step network construction and module detection”
For the varroa Gene network analysis we gonna use the “for_module” file created in the chunk “PCA” of section “Load data and library filtration”. This data frame is the initial input for all the WGCNA, and contains the varroa genes TPM, where columns are genes’ ID and rows are the RNAseq libraries
load(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/for_modules.rds")
# Take a look at the data frame (print 5 genes in 10 libraries):
head(for_modules[,1:5], 10)
## 111242787 111242788 111242789 111242790 111242792
## SRR3632582 16.69060 13115.3000 1.632676 4.68461 39.04070
## SRR3633003 27.71050 68.2067 1.006694 8.47756 86.32650
## SRR3634700 30.52360 37.5802 1.105930 7.45524 45.91280
## SRR3634772 39.17820 246.4390 3.671050 9.14239 48.84960
## SRR3634929 12.14060 74095.9000 9.223820 3.01799 20.93230
## SRR3634942 22.61910 12.7722 2.068261 7.35609 51.97150
## SRR3635001 34.69800 19.2896 1.312841 10.61080 58.61030
## SRR3635050 11.08990 85335.2000 9.115974 2.16917 17.92070
## SRR3635105 29.92650 253.0850 2.128474 7.89757 50.31910
## SRR3927486 7.45717 3993.3700 1.491740 1.11916 7.85349
# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=25, by=2))
# Call the network topology analysis function
sft = pickSoftThreshold(for_modules, powerVector = powers, verbose = 5)
## pickSoftThreshold: will use block size 4366.
## pickSoftThreshold: calculating connectivity for given powers...
## ..working on genes 1 through 4366 of 10247
## ..working on genes 4367 through 8732 of 10247
## ..working on genes 8733 through 10247 of 10247
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.84900 1.9000 0.807 3610.0 3730.0 5270
## 2 2 0.54000 0.6600 0.894 1910.0 1920.0 3460
## 3 3 0.00867 -0.0596 0.507 1200.0 1150.0 2540
## 4 4 0.22700 -0.3920 0.527 826.0 820.0 1970
## 5 5 0.35500 -0.6060 0.599 603.0 603.0 1580
## 6 6 0.42100 -0.7390 0.652 459.0 446.0 1300
## 7 7 0.45200 -0.8360 0.675 359.0 337.0 1080
## 8 8 0.49200 -0.8910 0.695 288.0 257.0 916
## 9 9 0.49900 -0.9460 0.682 236.0 199.0 787
## 10 10 0.51500 -0.9400 0.646 196.0 157.0 684
## 11 12 0.88600 -0.8650 0.973 141.0 99.9 557
## 12 14 0.87200 -1.0800 0.922 106.0 66.2 521
## 13 16 0.87100 -1.2100 0.871 82.4 44.8 491
## 14 18 0.86000 -1.2900 0.824 65.9 31.3 465
## 15 20 0.85000 -1.3200 0.809 53.9 22.2 442
## 16 22 0.84800 -1.3300 0.822 45.1 16.2 422
## 17 24 0.84000 -1.3200 0.840 38.4 11.9 404
# Plot the results:
sft_df <- data.frame(
Power = sft$fitIndices[,1],
sft_signedR2 = -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
mean.k = sft$fitIndices[,5])
# Scale-free topology fit index as a function of the soft-thresholding power
pSI <- ggplot(data = sft_df, aes(x = Power, y = sft_signedR2, label = Power)) +
xlab("Soft Threshold (power)") +
ylab("Scale Free Topology Model Fit,signed R^2") +
ggtitle("Scale independence") +
theme_classic() +
theme(text = element_text(size=16)) +
geom_text(data = sft_df, aes(x = Power, y = sft_signedR2), size=6) +
geom_hline(yintercept = 0.8, col="red") # this line corresponds to using an R^2 cut-off of h
# Mean connectivity as a function of the soft-thresholding power
pMC <- ggplot(data = sft_df, aes(x = Power, y = mean.k, label = powers)) +
xlab("Soft Threshold (power)") +
ylab("Mean Connectivity") +
ggtitle("Mean Connectivity") +
theme_classic() +
theme(text = element_text(size=16)) +
geom_text(data = sft_df, aes(x = Power, y = mean.k), size=6)
par(mar = c(4, 4, .1, .1))
pSI
pMC
Figure S2. Network construction using 10,247 genes of 66 SRA varroa libraries. a. picking soft threshold.
net = blockwiseModules(for_modules, power = 12,
TOMType = "unsigned", minModuleSize = 30,
reassignThreshold = 0, mergeCutHeight = 0.25, #0.25 means a correlation of 0.75
numericLabels = TRUE, pamRespectsDendro = FALSE,
#saveTOMs = TRUE,
#saveTOMFileBase = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/Varroa_modulesTOM",
verbose = 3)
# To see how many modules were identified and what the module sizes are, one can use table(net$colors).
table(net$colors)
#saveRDS(net, "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/net.rds")
Based on the analysis of network topology, for the construction of the network we set our threshold for merging of modules to 0.25, minimum number of 30 genes per module, and the power β of 12. This power is the lowest for which the scale-free topology fit index curve flattens out upon reaching a high value, in this case, when Rsq reaches 0.886 (Fig S2a). We then performed hierarchical clustering of the genes based on topological overlap (sharing of network neighborhood) to identify groups of genes who coexpressed across libraries, these are the network modules (fig S2b).
The hierarchical clustering dendrogram (tree) used for the module identification is returned in net$dendrograms[[1]]; The dendrogram can be displayed together with the color assignment using the following code
net <- readRDS("/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/net.rds")
# Convert labels to colors for plotting
mergedColors = labels2colors(net$colors)
# Plot the dendrogram and the module colors underneath
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],
"Module colors",
dendroLabels = FALSE, hang = 0.03,
addGuide = TRUE, guideHang = 0.05)
Figure S2. Network construction using 10,247 genes of 66 SRA varroa libraries. b. Hierarchical clustering dendrogram using merge cut height of 25%, revealing 15 co-expressed genes modules. Each branch of the dendrogram represents a single gene, and the colored bar below denotes its corresponding module, as annotated in the legend to the right. The dendrogram height is the distance between genes.
Save the module assignment and module eigengene information necessary for subsequent analysis:
moduleLabels = net$colors
moduleColors = labels2colors(net$colors)
MEs = net$MEs;
geneTree = net$dendrograms[[1]];
#save(MEs, moduleLabels, moduleColors, geneTree, file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/Varroa_modules_networkConstruction-auto.RData")
The important genes were identified according to their Module membership (Langfelder and Horvath 2008), and their annotation (based on sequence similarity to homologue genes). A gene Module membership expresses its connection to other genes in the module, and therefore, measures the extent to which this gene represents the overall module. The Module membership is calculated by Pearson correlation of the module eigengene and the gene expression. Genes with high Module membership and relevant annotation are likely to play a role in the vector-virus interaction, and are good candidates for later experimental validation.
# names (colors) of the modules
modNames = substring(names(MEs), 3)
# virusNames = substring(names(viruses_load_15), 1)
#make a table of the Module-membership ("MM") of each gene (which is its correlation coefficient, pearson)
geneModuleMembership_66 = as.data.frame(cor(for_modules, MEs, use = "p"));
MMPvalue_66 = as.data.frame(corPvalueStudent(as.matrix(geneModuleMembership_66), nSamples));
### Controlling the false discovery rate: Benjamini–Hochberg procedure ###
# using p.adjust function, for all comparisons, 15 modules and 15 viruses (m=225).
# first make the p-value matrix into a dataframe
MMPvalue_66_0 <- as.data.frame(MMPvalue_66)
# then "gather" all the p-values, so they will appear in one column
longer_Pvalue <- MMPvalue_66_0 %>%
rownames_to_column("module") %>%
gather("virus", "pvalue", -module)
# now calculate the p.adjust for each p-value
Padjust <- p.adjust(longer_Pvalue$pvalue, method = "fdr")
# and add the column of adjusted pvalues
Padjust <- add_column(longer_Pvalue, Padjust)
# now spread it back
MMPadjust_66 <- Padjust %>%
dplyr::select(-pvalue) %>%
group_by(virus) %>%
pivot_wider(names_from = virus, values_from = Padjust)
MMPadjust_66 <- column_to_rownames(MMPadjust_66, "module")
#change the name of the columns to start with "MM" then the module name
names(geneModuleMembership_66) = paste("MM", modNames, sep="");
names(MMPadjust_66) = paste("padj.MM", modNames, sep="");
genePurple = data.frame(
moduleCol = moduleColors,
geneModuleMembership_66,
MMPadjust_66) %>%
dplyr::select(c(moduleCol, MMpurple, padj.MMpurple)) %>%
dplyr::filter(moduleCol == "purple") %>%
mutate(moduleNum = "Module.10") %>%
rownames_to_column("genes")
# add gene annotation:
# load the annotation file:
annot_varroa <- read_csv("/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/annot_varroa.csv", col_names = TRUE, )
# remove the "LOC" from the gene name
annot_varroa$Locus <- str_replace(annot_varroa$Locus, "LOC", '')
# change the col name to "genes", so it will the same as in the "overlap" table
colnames(annot_varroa)[which(names(annot_varroa) == "Locus")] <- "genes"
head(annot_varroa)
## # A tibble: 6 x 11
## Name Accession Start Stop Strand GeneID genes `Locus tag` `Protein produc…
## <chr> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr>
## 1 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672900.1
## 2 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672902.1
## 3 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672903.1
## 4 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672904.1
## 5 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672905.1
## 6 Un NW_01921… 3.38e7 3.39e7 - 1.11e8 1112… - XP_022672908.1
## # … with 2 more variables: Length <dbl>, Protein Name <chr>
genesModule.10 <- left_join(annot_varroa, genePurple, by = "genes") %>%
na.omit() %>%
dplyr::select("genes", "moduleNum", "MMpurple", "padj.MMpurple", "Accession", "Protein Name") %>%
dplyr::rename(c(MM = MMpurple, MMpadj = padj.MMpurple))
# there are total of 263 matching annotations, as some of the genes have a few isoforms.
# save the final table of the genes in module 10
write_csv(genesModule.10, "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/genesModule.10.csv")
*** NOT INCLUDED *** #### Calculating Gene Significanse (GS)
# save(geneTraitSignificance_66, GSPadjust_66, geneModuleMembership_66, MMPadjust_66, file = "/results/geneTraitANDgeneMM_66.RData")
# load the
annot.vd <- read.csv("/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/data/VdesGOready2.csv")
#Preparing the GO frame
annot.vd2 <- annot.vd %>%
mutate(evidence = "IEA") %>%
dplyr::select(go_id = GO.ids, evidence, gene = Gene.id)
head(annot.vd2)
## go_id evidence gene
## 1 GO:0005515 IEA 111242787
## 2 GO:0005576 IEA 111242789
## 3 GO:0008061 IEA 111242789
## 4 GO:0006030 IEA 111242789
## 5 GO:0005887 IEA 111242790
## 6 GO:0004872 IEA 111242790
goFrame.vd <-GOFrame(annot.vd2, organism = "Vd")
goAllFrame.vd <-GOAllFrame(goFrame.vd)
gsc.vd <-GeneSetCollection(goAllFrame.vd, setType = GOCollection())
#Preparing the universe
universe.vd <- as.character(unique(annot.vd2$gene)) # there's a wired thing in the GSEAGOHyperGParams function, sometimes its required the universe to be "character".
head(universe.vd)
## [1] "111242787" "111242789" "111242790" "111242792" "111242797" "111242798"
# Preparing the gene set (list of genes in a module)
# change "black" to the name of the desired module, in the first line: [moduleColors=="black"], and in the final "write.csv(file = "GO_term_enrichment_**salmon**BP.csv")
ME <- names(for_modules)[moduleColors=="black"]
ME_df <- data.frame(gene = ME)
genes.vd <- unique(ME_df$gene)
head(genes.vd)
## [1] "111242800" "111242844" "111242861" "111242884" "111242885" "111242988"
params.vd <- GSEAGOHyperGParams(name = "Vd_GO_enrichment",
geneSetCollection = gsc.vd,
geneIds = genes.vd,
universeGeneIds = universe.vd,
ontology = "BP", # change with MF, CC to test all
pvalueCutoff = 0.05,
conditional = F,
testDirection = "over")
over.vd <- hyperGTest(params.vd)
over.vd
## Gene to GO BP test for over-representation
## 321 GO BP ids tested (81 have p < 0.05)
## Selected gene set size: 39
## Gene universe size: 5301
## Annotation package: Based on a GeneSetCollection Object
#summary(over.vd)
GO_enrich.vd <- as.data.frame(summary(over.vd)) %>%
arrange(Pvalue)
# write.csv(GO_enrich.vd, file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/GO_term_black.csv")
To test if we can predict the virus-varroa interaction given the virus abundance, we used Mantel-test for correlation between two distance-matrices (Mantel 1967).
# load the two matrices:
# the module–trait association matrix:
load(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/moduleTraitCor_66.RData")
# and the viral abundance correlogram:
virusAbundCor_66 <- readRDS(file = "/Users/nuriteliash/Documents/GitHub/varroa-virus-networks/results/virusAbundCor_66.rds")
virusAbundCor_66 <- virusAbundCor_66$corr
# make correlation matrix of the "moduleTraitCor":
corModulTrait_66 <- cor(moduleTraitCor_66)
# (1) Mantel test using "ape" library:
mantel.test(corModulTrait_66, virusAbundCor_66, graph = TRUE,
main = "Mantel test",
xlab = "z-statistic", ylab = "Density",
sub = "The vertical line shows the observed z-statistic")
## $z.stat
## [1] 6.276091
##
## $p
## [1] 0.001
##
## $alternative
## [1] "two.sided"
# (2) Mantel test using "vegan" library:
mantel(corModulTrait_66, virusAbundCor_66, method="pearson", permutations=1000)
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = corModulTrait_66, ydis = virusAbundCor_66, method = "pearson", permutations = 1000)
##
## Mantel statistic r: 0.6044
## Significance: 0.000999
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.133 0.179 0.224 0.267
## Permutation: free
## Number of permutations: 1000
#plot the correlation
#sizeGrWindow(9, 5)
verboseScatterplot(x = corModulTrait_66, y = virusAbundCor_66, main = "Correlation between Fig 2a, \n virus-virus interaction; and Fig 2b, \n varroa-virus interaction", xlab = "Correlation of viral interaction \nwith varroa modules", ylab = "Correlation of viral abundance", abline = T, abline.color = "black", bg = "black", cex.lab = 1.2, cex.main = 1, cex.axis = 1)
Figure 2c. Intra-viral interactions can predict virus - vector interactions. Correlation model between viral load correlations (fig 2a), and the distance-matrix of the module-virus correlations (fig 2b). For analysis in figure c, Mantel test for correlation between two matrices was conducted using 1,000 permutations.
In figure 2c, X-axis: how virus interacts with varroa expression, and Y-axis: the correlation of viral abundance across samples.
We found a significant positive correlation between the two distance matrices (Mantel-test for correlation between two distance-matrices (Mantel 1967), Mantel statistic r = 0.604, p = 0.001) (Fig 2c). Meaning, given viruses’ interaction we can predict how these viruses will interact with the vector’s module.